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Abstract. A current problem of practical significance is the analysis of large, 
spatially distributed, environmental data sets. The problem is more challenging for 
variables that follow non-Gaussian distributions. We show by means of numerical 
| simulations that the spatial correlations between variables can be captured by 

qq ■ interactions between "spins" . The spins represent multilevel discretizations of 

environmental variables with respect to a number of pre-defined thresholds. The spatial 
dependence between the "spins" is imposed by means of short-range interactions. 
We present two approaches, inspired by the Ising and Potts models, that generate 
conditional simulations of spatially distributed variables from samples with missing 



ON 

qq ' data. Currently, the sampling and simulation points are assumed to be at the nodes 

C^) . of a regular grid. The conditional simulations of the "spin system" are forced to 

respect locally the sample values and the system statistics globally. The second 
constraint is enforced by minimizing a cost function representing the deviation between 
normalized correlation energies of the simulated and the sample distributions. In the 
approach based on the N c — state Potts model, each point is assigned to one of N c 
classes. The interactions involve all the points simultaneously. In the Ising model 
approach, a sequential simulation scheme is used: the discretization at each simulation 
level is binomial (i.e., ±1.) Information propagates from lower to higher levels as 
the simulation proceeds. We compare the two approaches in terms of their ability to 
reproduce the target statistics (e.g., the histogram and the variogram of the sample 
distribution), to predict data at unsampled locations, as well as in terms of their 
computational complexity. The comparison is based on a non-Gaussian data set 
(derived from a digital elevation model of the Walker lake area, Nevada, USA). We 
discuss the impact of relevant simulation parameters, such as the domain size, the 
number of discretization levels, and the initial conditions. 
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1. Introduction 

Spatially distributed data are common in the physical sciences. They represent various 
environmental variables, such as contaminant concentrations in the atmosphere, soil 
permeability and dispersivity, etc. To account for the spatial variation and the 
measurement uncertainty of such quantities, the mathematical model of spatial random 
fields is commonly used. To date, huge numbers of large spatial data sets are gathered 
from around the globe on a daily basis. The efficient storage, analysis, harmonization, 
and integration of such data is of great importance in various scientific areas such as 
image processing, pattern recognition, remote sensing, and environmental monitoring. 
Often the data coverage is incomplete for various reasons, such as meteorological 
conditions (measurements hindered by clouds, aerosols, or heavy precipitation) or 
equipment limitations (values below detection level or resolution threshold). Hence, 
in order to use standard tools for the data analysis and visualization, one needs to deal 
with the problem of incomplete (missing) data. In addition, data sampled at different 
resolutions may need to be combined (e.g., in data fusion methods). This requires 
down-scaling (refining) of the data with the coarser resolution. 

These tasks can be performed by means of well established interpolation and 
classification techniques pp. While deterministic methods (e.g., nearest-neighbor or 
inverse distance interpolation) can be used, stochastic methods are often preferred 
because they are more flexible in incorporating spatial correlations and and they provide 
estimates of prediction uncertainty. However, considering the ever- increasing size of 
spatial data, stochastic methods such as kriging [2], can be impractical due to their 
high computational complexity, requiring the use of high performance computational 
technologies [3]. Furthermore, kriging is founded on the assumption of a jointly 
Gaussian distribution, which is often failed by the data. Practical application of kriging 
involves considerable human (subjective) input, regarding the selection of the correlation 
(variogram) model and the kriging neighborhood selection [I]. 

The classical geostatistical approach relies on the structure function (variogram) 
for modeling the spatial correlations. However, the correlations can also be considered 
as the outcome of "interactions" between the field values at different points [5] that 
generate short-range spatial order. In the case of static disorder, which is typical 
in case of "quenched" geological disorder or measurements representing a single time 
slice of a dynamical process, such interactions represent "effective constraints" imposed 
by the underlying process. For example, in the case of a digital elevation model, 
these constraints are imposed by the topography of the area. While the nature of 
the constraints may differ significantly between processes, we believe that universal 
aspects of spatial correlations can be captured by means of relatively simple effective 
interactions. By incorporating the interactions in an energy functional a Gibbs 
probability measure can be defined as in [5]. The realization probability of each 
spatial configuration is then governed by the relative weight of the "many body" Gibbs 
probability density function. In the Gaussian case it is straightforward to define the 
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interactions between the random field values at neighboring locations. On a regular 
lattice this approach defines Gauss-Markov random fields. 

In the non-Gaussian case, it is convenient to discretize the continuous values of the 
random field using a set of discrete "spins". We arbitrarily define higher spin values 
to represent higher values of the field. In the spirit described above, one can then 
consider interactions between different "spins". In this framework, the problems of 
spatial interpolation and simulation are mapped into a spatial classification problem, 
in which each "prediction" location is assigned to a specific spin value (class). The 
concept of classical spins is a suitable choice for modeling the multilevel discretization 
of the continuous field. Spin models from statistical physics, e.g., the Ising and Potts 
models, have already been applied in various problems in economy and finance [HI OH], 
materials science 0, [TUJ [TT], and biology [12]. However, these studies focus mainly on 
long-range correlations. In the framework of Gibbs-Markov random fields, the short- 
range correlation properties of Potts and Ising models have been widely applied in 
image analysis (see e.g. [TBI [HI [15], [TBI E])- The Potts model in super-paramagnetic 
regime was also proposed as a data clustering model [18]. We introduce here two non- 
parametric models for spatial classification that are loosely based on the Ising and 
Potts spin models. The first model employs a sequential classification approach while 
the second a simultaneous (parallel) classification of all levels. 

The rest of the paper is organized as follows. In Section [2j we present the problem 
of spatial classification/prediction, and we review the commonly used classification 
algorithm of fc-nearest neighbors. Section [3] briefly reviews the Ising and Potts models 
and introduces the "spin" nearest-neighbor correlation models that we propose for 
spatial classification. In Section [4] we investigate technical aspects of the simulations, 
and we describe computational details. Section [5] focuses on the case study: it presents 
computational details as well as the analysis of the simulation results. Finally, in Section 
El we summarize the relevant results and point out some future directions. 

2. Spatial prediction and classification 

Let us consider a set of sampling points G s = where r*j = (xi,yi) 6 M 2 and 

i = 1,...,N. These points are assumed to be scattered on a rectangular grid G of 
size Nq = L x L y > N, where L x and L y are respectively the number of nodes in the 
orthogonal directions (in terms of the unit length). If Zi is a value attributed to the 
point fi, the set Z{G S } = {zi G M} represents the sample of the process. Here we 
assume that Z{G S } represents a sample from a realization of a continuous random field 
Z(r; u), where u is the state index. 

Let G p = {f p } be the set of prediction points where p = 1, . . . , P, such that 
G = G s U G p . We discretize the continuous distribution of Z using a number of 
classes, C q , q = 1, . . . , N c . The classes are defined with respect to a set of threshold 
levels t k , k = 1, . . . , N c + 1. If z min = minfo, ...,z N ), ^ max = max(z 1 , ...,z N ), 
St = (z mauX —z m i n )/N c , the thresholds are defined as: t\ = — oo, t 2 = z m i n +5t, t. t = U-i+St 
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(i = 3, . . . , N c ) and t Nc+ \ = oo. Each class C q corresponds to the interval C q = (t q , t q+1 ] 
for q = 1, . . . , N c . This means that all the classes, except the first and the last, have 
a uniform width. The classes C\ and Cn c extend to infinity (negative and positive 
respectively), to allow for values at the prediction points that lie outside the observed 
interval [z m m , Zm^} ■ We define the class indicator field (phase field) Izif) by means of 



where 8(x) = 1 for x > and 8(x) =0 for x < is the unit step function. Prediction 
of the field values Z{G P } is mapped onto a classification problem, i.e., into estimating 
Iz{G p }. If the number of levels is high, the continuum limit is approached. 

2.1. k-nearest neighbor (k-NN) classifier 

The fc-NN method is a supervised learning algorithm, which assigns to each prediction 
point the class represented by the majority of its k nearest neighbors from the sample 
[T9] . The distance metric used is typically Euclidean. The optimal value of the 
parameter k depends on the data and requires tuning for different applications. The 
optimization of k usually involves heuristic techniques (e.g. cross-validation). The 
accuracy of the fc-NN classifier is affected by noise or by selecting a neighborhood that 
does not include the pertinent spatial information. The fc-NN algorithm has been widely 
applied in the field of data mining, statistical pattern recognition, image processing 
and many others. In general, it has been found to outperform many other flexible 
nonlinear methods, particularly in high- dimensional spaces [20]. Furthermore, it is 
relatively easy to implement and has desirable consistency properties, e.g., favorable 
asymptotic classification error dependence. 

We use the fc-NN classifier as a benchmark for the "spin-based" classification 
methods we propose. To eliminate the effect of k on the classification results, for 
each simulated realization we perform the classification for a wide range of values of 
k = 1, . . . , fcmax; and select the value k opt that minimizes the misclassification rate. 

3. Spatial Classification based on "spin" correlation models 

We propose two non-parametric, nearest-neighbor (NN), multilevel correlation (MLC) 
models that are loosely based on the Ising and Potts models. The NN-MLC models are 
based on matching suitably normalized correlation energy functions calculated from the 
samples with those estimated over the entire prediction grid. Similar ideas of correlation 
energy matching were also applied in the reconstruction of digitized random media from 
limited morphological information [211 122] . In contrast with those studies, in the NN- 
MLC models we also use local spatial information (i.e., the values at the sampling 
points). The prediction of the field values (more precisely, class) at unsampled locations 
is achieved by means of conditional simulations that respect locally the sample values 
and the correlation energy globally. 




(1) 



q=l 
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3.1. Ising model 

The Ising model (e.g. [23J) involves discrete variables s« (spins) placed on a sampling 
grid. Each spin can take two values, ±1, and the spins interact in pairs. Assuming 
only nearest-neighbor interactions, the energy of the system can be expressed by the 
following Hamiltonian: 



The coupling strength controls the type (ferromagnetic for J^ > 0, antiferromagnetic 
for Jij < 0) and strength of the interactions. The second term introduces a symmetry- 
breaking bias caused by the presence of a site dependent external field h%. The latter 
controls the mean spin value (the magnetization). The model is usually defined on a 
regular grid, the interactions are considered uniform and their range limited to nearest 
neighbors. However, the model can be generalized to include also irregular grids and 
longer-range interactions. 

3.2. Potts model 

The Potts model is a generalization of the Ising model (e.g. [211 )■ Instead of ±1, each 
spin is assigned an integer value Si G {1, . . . , N c }, where N c represents the total number 
of states or classes. The Hamiltonian of the Potts model is given by 



where 5 is the Kronecker delta. Hence, only pairs of spins in the same state give a non- 
zero contribution to the correlation energy. For N c = 2, the Potts model is equivalent 
to the 2D Ising model. 

3.3. General Assumptions for NN-MLC Classifiers 

The states (configurations) of both the Ising and Potts models are determined by the 
Gibbs probability density function / = Z^ 1 exp(— iT/fceT), where H is the respective 
Hamiltonian, k% is the Boltzmann's constant, and T is the temperature. The partition 
function Z is obtained by summing e~ H//fcsT over all possible spin configurations. 
Essentially, the Hamiltonian involves two parameters, i.e. the normalized interaction 
couplings Jij = JijjksT and hi = hijk^T. In the case of spatial classification, the 
parameters are not known a priori and need to be determined from the sample. The 
standard procedure for this inverse problem entails using the maximum likelihood 
method. Assuming that the parameters can be inferred, the spin values at unsampled 
locations are predicted by maximizing the conditional probability density function 
/ {l z{G p\\I z{G s\\ J^, hi j . However, optimizing the likelihood of the models with 

respect to the coupling parameters Jij, hi is a computationally intensive task, since 
there are no generally valid closed-form expressions for the partition function. In order 
to overcome this problem we opted for a non-parametric approach. 
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Our NN-MLC models retain only the interaction energies of the Ising and Potts 
spin Hamiltonians. The sample values Z{G S } are mapped into spin values by suitable 
discretization (as shown below). The main idea in both methods is to match the 
sample correlation energy with the correlation energy of all the spins (i.e. including 
the simulated spins at prediction points). This relies on the ergodic assumption that 
the sample spin correlation energies accurately describe those of the entire system. 
The matching of the correlation energies is performed by means of a numerical Monte 
Carlo approach. During the process, the spins at the sample sites remain fixed. 
Hence, the procedure followed is conditional simulation, as opposed to deterministic 
prediction. Focusing on regular grids and assuming isotropic and nearest-neighbor "spin 
interactions," it is reasonable to set = J if the points fj, fj are lattice neighbors 
and Jij = otherwise. Furthermore, we set hi — (i — 1, . . . , N). This choice prevents 
explicit control of the mean spin. Nevertheless, as shown by the simulation results, 
judicious choice of the initial state allows the distribution of the predicted classes to 
accurately recover the class distribution of the sample. This is due to the fact that both 
the correlation energies and the local spin values are restricted by the sample. 



34. Ising-based NN-MLC model (I-NN-MLC) 

We propose a sequential scheme in which the sample, G q s and prediction, G q grids are 
sequentially updated. Each simulation level q corresponds to a class index. The number 
of simulation levels coincides with that of discretization levels. For the lowest level 
G\ = G s , Gp — G p . For q — 1, . . . ,N C - 1 it holds that G% C G q s +1 , G q p D G q p +1 and 
G = G q U G q . Binary- valued spins are used at each level q(q = 1, . . . , N c ). Sites (either 
from the sample or simulated) having Iz < q are assigned a spin value of —1, while 
sites having Iz > q are assigned a spin value of 1. All the sites that are assigned a spin 
— 1 value at level q retain this value at higher levels. This means that the areas of low 
values are classified first. Once a site r*j is assigned a spin —1 value at level q < N c , 
it is also assigned class value -Tz(n) = <?• At the same time, the set G q s +1 acquires all 
the —1 points while the set G q p +1 is accordingly reduced. In contrast, for sites that are 
assigned spin 1 value Izifi) > Q, and the precise class value Izifi) is determined at a 
higher level. At level q = N c all remaining sites are assigned to the iV c -th class. 

Let S q = {s*; Mi s.t. r*j e G q }, Mq = 1, . . . , N c , be the set of spin values included in 
the "sample" at level q. These values are ±1 (depending on Zi) if r*j e G s and —1 if 
is an already classified prediction point. The unknown values at level q are the spins at 
the remaining locations, denoted by S q . 

Our non-parametric approach utilizes a cost function, Ui(S q \S q ), that describes 
the deviation between the normalized correlation energies of the simulated spin 
configuration, Cf, and its sample counterpart, Cf. s estimated from the spins in G q . 

"l - C?(Sl S q )/Cl(S q )} 2 , for CI + 0, 
V Y {Sl\Sft = \ (2) 

c?(s q ,siy, forC*, = 0, 
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where C% (S%) = (s? s|)g| is the spatially averaged (normalized by the number of nearest 
neighbor pairs in Gf) spin pair correlation of the q— level sample and Cj(S^, S% ) = 
( s i s j)G * s the spatially averaged spin pair correlation over the entire grid. Given the 
above, the estimation of S£ is equivalent to finding the optimal configuration S% that 
corresponds to the minimum of the cost function (J2J) at a fixed temperature T, i.e., 

= argmi % , U^S*), for q = 1, . . . , JV C . (3) 
3.5. Potts-based NN-MLC model (P-NN-MLC) 

In the P-NN-MLC model the classification is performed for all classes simultaneously. 
Hence, there is only a single simulation level irrespectively of the number of discretization 
levels. The grid spins S = {sj}, % = 1, . . . , N take values in the set Sj G {1, . . . , N c }. 
The cost function is given by 

1 - C P {S P , S s )/C P;s (S s )] \ for C P;s ^ 0, 
U P (S P \S S ) = { (4) 

Cp(S p , S s ) 2 } for Cp. s = 0, 

where Cp- s (S s ) = (S( SuSj ))G s is the spatially averaged (normalized by the number of 
nearest neighbor pairs in G s ) spin pair correlation of the sample and Cp(S p , S s ) = 
(fi(8i,sj))G ^ s ^ ne avera g e spin pair correlation over G. The estimation of S p is equivalent 
to finding the minimum of the cost function (jlj), i.e., 

S p = argmin 5 U P (S P \S S ). (5) 



4. Simulations of Missing Data on Regular Grids 

We focus on samples with missing data on regular grids. On such grids, it is 
straightforward to determine the nearest neighbors and calculate the correlation 
energies. Both the I-NN-MLC and P-NN-MLC methods return a class indicator field 
[see Eq. (prj)] I z = I Z {G S ) U Iz(G p ), which consists of the original sample classification 
and the class estimates at G p . The indicator values at the sampling sites are exactly 
reproduced. Below we refer to Iz(G s ) as the training set. Optimization of the cost 
functions (j2J) and (T4|) are performed using the Monte Carlo approach. The generation of 
new "trial" spin states is realized using the Metropolis algorithm at zero temperature. 

We use the rejection ratio defined by p = (^rejected states)/iV(G|), where N(G£) 
is the number of prediction points at the g-level, to determine the stopping criterion. 
More specifically, our simulations terminate if p = 1, i.e., if one complete sweep through 
the entire grid G q v does not produce a single successful update. Reaching the termination 
criterion may require several sweeps through the lattice, depending on the initial state. 



4-1. Greedy Monte Carlo 



The T = assumption implies that the stochastic selection of an energetically 
unfavorable spin configuration in the Metropolis step is not possible. Hence, the 
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candidate "spin" for the update is flipped unconditionally only if it lowers the cost 
function. This approach is called the "greedy" Monte Carlo algorithm [25] and leads to 
very fast convergence. In contrast, in simulated annealing T is slowly lowered starting 
from an initial high-temperature state. This approach is much slower computationally. 
However, the configuration resulting from simulated annealing is less sensitive to 
the initial state. The sensitivity of the greedy algorithm is known to be especially 
pronounced in high-dimensional spaces with non-convex energies. In such cases, the 
greedy algorithm is likely to be trapped in local minima, instead of converging to the 
global one. However, this is not a concern in the current problem. In fact, targeting the 
global minimum of U\ and U-p strongly emphasizes the sample correlation energy per 
"spin" pair. However, the latter is influenced by sample-to-sample fluctuations. 

On rectangular or square grids, further increase in computational efficiency is 
gained by taking advantage of the geometry and the nearest-neighbor interactions. 
This is achieved by splitting the grid into two interpenetrating subgrids, which allows 
vectorizing the algorithm. Hence, one sweep through the entire grid is performed in just 
two steps by simultaneously updating all the sites on one of the subgrids in each step. 



4-2. Simulation Algorithms 

Based on the above, the monte Carlo (MC) algorithms for the I-NN-MLC and P-NN- 
MLC models consist of the following steps: 

Algorithm for I-NN-MLC model 

(1) Initialize the indicator field on the entire grid by means of Iz(G) = NaN 

(2) Set the simulation level (class index) to q = 1 

(3) While [loop 1] q < N c — 1 discretize Z{G S } with respect to t q+ \ to obtain 

(3.1) Given the data S%, calculate the sample correlation energy Cj. s 

(3.2) Assign initial values to the spins at G q p , i.e., generate S q p ^ 

(3.3) Calculate the initial values of the simulated correlation Cj^ 
and the cost function ; initialize the MC index % = 1 

(3.4) Initialize the rejection ratio p — > 0; and the rejected states index i r = 

(3.5) While [loop 2] p < 1 repeat the following updating steps: 

(3.5.1) Generate a new state 5p by perturbing Sp^ 

(3.5.2) Calculate Cj {i+1) anduf +l) 

(3.5.3) IfUj l+1 ^ < Uj accept the new state; i r — > 0. 
else keep the "old" state; i r — > i r + 1; endif 

(3.5.4) p ^i r /N(G«); + 
end [loop 2] 

(3.6) Assign the —1 "spins" to the q level, i.e., Izd^i}) = q 
IfS^\{f l }) = -l,{f l }eG 

(3.7) Increase simulation level, q — > q + 1, return to step (3) 
end [loop 1] 

(4) For q = N c , Vr- (i = 1, . . . , N^) such that i z {{fi}) = NaN, set I z {{fi}) = N c . 
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In the above, the symbol NaN is used to denote non-numeric values. 

Algorithm for P-NN-MLC model 

(1) Discretize Z{G S } with respect to tk, k — 1, . . . , N c + 1 to obtain S s 

(2) Given the data S s , calculate the sample correlation energy Cp- S 

(3) Assign initial values to the spins at G p , i.e., generate Sf^ 

(4) Calculate the initial values of the simulated correlation Cp^ 
and the cost function U p °^ ; initialize the MC index i = 1 

(5) Initialize the rejection ratio p —>■ 0; and the rejected states index i r = 

(6) While p < 1 repeat the following updating steps: 

(6.1) Generate a new state Sp by perturbing Sp 

(6.2) Calculate Cp l) and U p i+1) 

(6.3) If Up +l ^ < Up* accept the new state; i r — > 0. 
else keep the "old" state; i r —>■ i r + 1; endif 

(6.4) p-+i r /N(G p ); z->i + l; 
end 

(7) Assign I z ({n}) = S^\{n}), {r t } £ G. 



4-3. Initial state selection 

The initial configuration of class indices can be selected in a number of ways. Since 
the proposed models aim to provide fast and automatic classification mechanisms, the 
initial configuration (assigned in steps (3.2) and (3) in the I-NN-MLC and P-NN-MLC 
algorithms) should minimize the relaxation path in state space to the equilibrium. It 
should also be selected with little or no user intervention. Since a degree of spatial 
continuity is common in geospatial data sets, it makes sense that the initial state of the 
individual prediction points is determined based on the sample states in their immediate 
neighborhood. On square grids, we determine the neighborhood of f p by an m x m 
stencil (where m = 21 + 1) centered at f p . Then, the initial value at a prediction point is 
assigned by majority rule, based on the prevailing value of its sample neighbors inside 
the stencil. If we considered a circular stencil, this method would correspond to the 
fc-NN classification algorithm, k being the number of sampling points inside the stencil. 

We set the stencil size adaptively to the smallest size that contains a clear majority 
of sample spin values, as shown schematically in Fig. ([1]). In practice, it makes sense 
to impose an upper limit on the stencil size m max . If no majority is established up to 
the maximum stencil size m max x m max , the initial value at f p is assigned randomly 
from the equally represented class indices with the highest frequency (in the I-NN-MLC 
this means ±1.) If majority is not reached due to absence of sampling points inside 
the maximum stencil, the initial value is drawn randomly from the entire range of 
the labels 1, . . . , iV c . There are sensible reasons for imposing a maximum stencil size. 
First, considering too large neighborhoods in the fc-NN classification generally generates 
oversmoothing at larger-scales that can not be justified as an effect of local continuity. 
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(a) (b) 



Figure 1. Schematic demonstrating the stencil selection. Arrows pointing up 
correspond to +1 spins, arrows pointing down to —1 spins, and the empty circles 
to the prediction location. In the first plot (a) a square stencil of size 7774=3 is used. 
Within the neighborhood marked by the dash-line square there is an equal number of 
+1 and —1 sites. In the second plot (b) a stencil of size 7772 = 5 is used to break the 
tie. 



Second, large neighborhoods increase the computational demands (both for memory 
and CPU time) disproportionately to expected benefits. Finally, assigning a portion of 
the prediction points initial values at random introduces a degree of randomness, which 
can be used to assess uncertainty by performing multiple runs on the same sample set. 
The choice of m max is arbitrary to an extent. Intuitively, for sparser sampling patterns 
larger m max should be considered. In our investigations, relatively small sizes (up to 
m max = 7) were sufficient to establish good statistical performance at relatively small 
computational cost. 

The algorithms thus require that only two parameters be set: m max and N c . The 
latter depends on the study's objectives: if the goal is to determine exceedance levels of 
a pollutant concentration with respect to a regulatory threshold, a binary classification 
is adequate. For environmental monitoring and decision making purposes a moderate 
number of classes (e.g., eight) is often sufficient. 



5. Case Study: Missing Data Reconstruction 

To test the classification models we use a synthetic pollutant concentration data set 
derived from a digital elevation model of the Walker lake area in Nevada [26]. A two- 



dimensional projection of the pollution field is shown in Fig. 2(a) The units used for the 
Z values are arbitrarily set to parts per million (ppm). Some summary statistics are as 
follows: number Nq = 78 000 on a 260 x 300 rectangular grid, z ni \ n = 0, z m3X = 1631.2, 
z = 277.9, 2 .50 — 221.3, a z = 249.9, the skewness coefficient is 1.02, and the kurtosis 



coefficient 3.78. As evidenced from the above statistics and the histogram in Fig. 2(b) 
the distribution is clearly neither Gaussian nor log-normal. 



5.1. Computational details 

^From the complete data we draw a sample Z{G S } (training set) of size N = (1— p)N^ by 
randomly removing P = pNq values (validation set), which are later used for prediction 
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Figure 2. Map and histogram of the original complete data. 



validation. For three degrees of thinning, p = [0.33, 0.50, 0.66], we generate 100 different 
configurations of the training and validation sets. The values at the prediction points 
are predicted (classified) using the I-NN-MLC and P-NN-MLC classification algorithms. 
The class indicator values at the prediction points Iz{G p ) are then compared with the 
classification estimates Iz(G p ). To evaluate the classification performance, we calculate 
the misclassification rate as a fraction of misclassified pixels: 

p 



p=i 



6(i z (f p ),i z (f p )) 



(6) 



where Iz(r p ) is the true value at the validation points, Izifp) is the classification 
estimate and 5(1, 1') — 1 if / = I' , 5(1, 1') — if I ^ I'. The standard deviation 
of the quantity F is evaluated using the values obtained from all the configurations, 

as STD^ = \J^2l^i(F* — F*) 2 /99. We also compare the class index variograms of the 
original and reconstructed patterns in the orthogonal lattice directions, defined by 

' ^ I z (fA hi * 



2\N t (h)\ 



E 



(7) 



i,j£N L (h) 

where N,,(h) denotes the set of pairs of points in i orthogonal lattice direction separated 
by distance h (lag), and \N L (h)\ denotes the number of pairs in the set. 

Furthermore, we record the CPU time, the number of MC sweeps needed to 
reach equilibrium, and the residual values of the cost functions at termination. The 
procedure is repeated for N c = 8 and iV c = 16. The computations are performed in the 
Matlab® programming environment on a desktop computer with 1.93 GB of RAM and 
an Intel®Core™2 CPU 6320 processor at 1.86 GHz. 
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(c) P-NN-MLC, p = 0.33 (d) P-NN-MLC, p = 0.66 

Figure 3. The reconstructed maps corresponding to the first realization obtained by 
(a) I-NN-MLC with p = 0.33, (b) I-NN-MLC with p = 0.66, (c) P-NN-MLC with 
p = 0.33, and (d) P-NN-MLC with p = 0.66. 

5.2. Analysis of Missing Data Reconstruction Results 

In Figs. [3]-[7]the classification performance of the two spin-based models is demonstrated 
for two limiting cases: one with a low number of levels and low degree of thinning 
(N c = 8 and p = 0.33), and the other one with a high number of levels and high degree 
of thinning (N c = 16 and p = 0.66). The plots in these figures include: in Fig. El 
a 2D projection of the reconstructed isolevel map (shown for the first realization), in 
Fig. HI the spatial distribution of the class index standard deviations based on the 100 
realizations, in Fig. El the histograms of the original data as well as the best (lowest 
F*) and worst (highest F*) reconstructions, and in Figs. El and the variograms (along 
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(c) P-NN-MLC, p = 0.33 (d) P-NN-MLC, p = 0.66 



Figure 4. The standard deviations calculated based on the classification results 
obtained from 100 realizations obtained by (a) I-NN-MLC with p = 0.33, (b) I-NN- 
MLC with p = 0.66, (c) P-NN-MLC with p = 0.33, and (d) P-NN-MLC with p = 0.66. 

the directions of coordinate axes) of the original data versus those of the best and worst 
reconstructions. In all cases, there is good visual agreement between the spatial patterns 
of the original data and the reconstructions. However, at higher values of p = 0.66 and 
N c — 16, closer inspection of the maps can reveal some degree of smoothing and small 
speckles of misclassified pixels. The higher misclassification in the p = 0.66 and N c = 16 
case is also manifested in the remaining quantities, in particular, poorer matching of the 
histograms, and larger deviations of the reconstructions' variogram curves with respect 
to the original. 

The histograms are displayed in log-lin scale in order to better visualize also the 
small frequency classes (in the tail). The natural logarithm of the class frequencies, iVj 
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Levels 

(a) I-NN-MLC, p = 0.33 




Levels 

(b) I-NN-MLC, p = 0.66 




Figure 5. The log- histograms of the original, the best (F^n), and the worst (i^as) 
reconstructed data, obtained by (a) I-NN-MLC with p = 0.33, (b) I-NN-MLC with 
p = 0.66, (c) P-NN-MLC with p = 0.33, and (d) P-NN-MLC with p = 0.66. 



(i = 1, . . . ,N C ) is used. On the other hand, the logarithmic scale somewhat visually 
suppresses the differences in the high frequency classes. Nevertheless, in the I-NN-MLC 
model we can observe a systematic underestimation of the highest-frequency (first) 
class and the low- frequency classes (especially at larger p). Their underestimation 
(and overestimation of the classes closer to the mean J^ 6 « 3.3) is reflected in the 
noticeable decrease in the class index variance of the reconstructed maps (as shown by 
the variogram plots). On the other hand, for the P-NN-MLC model, the frequencies of 
the most represented classes are reconstructed reasonably well, and the classes in the 
tail only represent a small portion of the total data (e.g., for N c = 16 the classes larger 
than 13 represent less than 0.1% of Nq) and, therefore, the variation in their frequencies 
have relatively little impact on the variograms. 

The misclassification rate and its standard deviation of the two algorithms are 
compared in Table [U In terms of the misclassification rate, the I-NN-MLC model 
performs uniformly better than the P-NN-MLC model. For the current set the 
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(c) P-NN-MLC, p = 0.33 (d) P-NN-MLC, p = 0.66 



Figure 6. The x-axis direction variograms of the original, the best (F^ in ), and the 
worst {F^ ax ) reconstructed data, obtained by (a) I-NN-MLC with p = 0.33, (b) I- 
NN-MLC with p = 0.66, (c) P-NN-MLC with p = 0.33, and (d) P-NN-MLC with 
p = 0.66. 

differences are not large, but they can be significant for different data (see I5.4I below). 
Comparing the proposed spin models with the fc-NN classifier, both models gave 
uniformly smaller misclassification rates than the best achievable by the fc-NN algorithm. 

5. 3. Computational performance of classification methods 

The computational performance of the proposed spin classifiers is compared to the 
fc-NN classifier also in Table [H Due to binary values of the Ising spins, the I-NN- 
MLC model requires a very small number of Monte Carlo sweeps over the grid to 
reach equilibrium. In the most "difficult" case (N c = 16 and p = 0.66), it takes less 
than 20 lattice sweeps. This implies short optimization CPU times at each level. A 
substantial fraction of the total CPU time is spent for the initial state assignments at 
each level. For the P-NN-MLC model the initial state is determined once. On the other 
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Figure 7. The y-axis direction variograms of the original, the best (i 7 ^^), and the 
worst (F^ax) reconstructed data, obtained by (a) I-NN-MLC with p = 0.33, (b) I- 
NN-MLC with p = 0.66, (c) P-NN-MLC with p = 0.33, and (d) P-NN-MLC with 
p = 0.66. 



hand, due to the significantly larger configuration space, the relaxation is much slower 
than for the I-NN-MLC model. Nevertheless, it is accomplished within maximum 32 
(fastest) and 85 (slowest) MC sweeps in less than 2 (fastest) and 8 (slowest) seconds 
of CPU time. Optimizing the fc-NN algorithm involved time-consuming multiple runs 
for each realization to test a wide range of k values, leading to considerably higher 
CPU times (not reported). In practical applications, the optimal value of k is often 
selected by heuristic techniques (e.g., cross-validation), which also require user input and 
computational resources. Overall, the I-NN-MLC and P-NN-MLC models can provide 
better classification accuracy more efficiently and without user intervention. 
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Table 1. The mean values of the misclassification rate (F*) and the standard 
deviations STD F , obtained by the I-NN-MLC and P-NN-MLC models are compared 
with the best results obtained by the fc-NN classification ((F£ nn ) and STDj?* ). The 
additional statistics for the I-NN-MLC and P-NN-MLC models include: the mean 
numbers of Monte Carlo sweeps (Nmc), the mean values of the CPU time (T cpu ), and 
the mean values of the cost function at termination (U*). The averaging is performed 
over 100 realizations. 



# of classes 






8 classes 










16 classes 






P 


0.33 


0.5 


0.66 


0.33 


0.5 


0.66 


0.33 


0.5 


0.66 


0.33 


0.5 


0.66 


Model 










k Nearest Neighb 


ors 










(Hun) [%] 


22.6 


24.0 


25.9 


22.6 


24.0 


25.9 


39.7 


41.4 


43.4 


39.7 


41.4 


43.4 


STDf* 

kn.n. 


0.24 


0.20 


0.16 


0.24 


0.20 


0.16 


0.31 


0.22 


0.18 


0.31 


0.22 


0.18 


Model 


I-NN-MLC 


P-NN-MLC 


I-NN-MLC 


P-NN-MLC 


<F*> [%] 


21.4 


22.6 


24.2 


21.7 


22.9 


24.5 


37.1 


39.0 


41.7 


38.1 


39.8 


41.8 


STD F . 


0.23 


0.19 


0.17 


0.23 


0.20 


0.17 


0.28 


0.22 


0.21 


0.29 


0.21 


0.17 


(N M c) 


5.9 


7.5 


9.1 


31.8 


35.9 


41.4 


13.8 


17.4 


19.4 


65.8 


74.6 


85.0 


(Tcpu) [s] 


2.62 


3.21 


3.87 


1.97 


3.04 


4.28 


5.31 


6.29 


7.15 


3.37 


5.31 


7.45 


(U*) 


5e-4 


le-3 


2e-3 


le-3 


2e-3 


2e-3 


4e-4 


8e-4 


le-3 


3e-3 


6e-3 


6e-3 



5.4- Reconstruction of synthetic Gaussian random field 

To further investigate differences in the classification performance between the I-NN- 
MLC and P-NN-MLC models, we generated smooth synthetic data on a 50 x 50 
grid. The data represent a realization (see Fig. [8]) from a Gaussian random field 
Z ~ N(m = 50, a = 10) with Whittle-Matern correlations [27]. The correlation 
function is c(r) = cr 2 YWj( Kr ) V Kv( Kr )i where v = 2.5 is the smoothness parameter, 
k = 0.2 is the inverse correlation length, and K u is the modified Bessel function of 




10 20 30 40 50 



Figure 8. Synthetic random field with a Gaussian distribution Z ~ N(m = 50, a = 
10) and Whittle-Matern type correlations (v = 2.5 and k = 0.2). 
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order v. The biggest difference in classification accuracy between the two models is 
obtained for N c = 16 and p = 0.33: -Fp_ NN __ M LC = 35.1%, while -Fi*-nn-mlc = 21.2%. 
We believe that the superior performance of the I-NN-MLC model results from the 
sequential strategy, in which points classified as —1 at lower levels are included in the 
sampling set at higher levels. Provided that the classification at lower levels is accurate, 
which is more likely for rather smooth and noise-free data (like the synthetic ones shown 
above), the included estimates can significantly improve the model's performance. The 
sequential algorithm also reduces potential state degeneracy (i.e., spin configurations 
with the same energy). This feature is likely to occur in the spin models and it increases 
ambiguity in the identification of a particular spin configuration from the correlation 
energy. At the same time, the propagation of classification results from lower to higher 
levels can also be a weakness of the sequential algorithm, since low level classification 
errors influence the higher levels. 

6. Conclusions and future research 

We presented non-parametric approaches for spatial classification, inspired from the 
Ising and the Potts spin models, with a sequential and simultaneous classification 
strategy, respectively. The concept is based on the idea of matching the normalized 
correlation energies calculated from discretized data over the sampling grid and the 
entire area of interest. The matching is performed using Monte Carlo simulations, 
conditional on the sample values. The main advantage of the models is that 
they do not have any hypeparameters that need tuning; hence the classification is 
automatic, objective, competitive (in accuracy) and computationally efficient. The 
proposed methods are applicable to non-Gaussian distributions. In addition, they can 
incorporate non-stationary data, because even with a constant coupling strength the 
spin interactions imply a local impact of the sample values. 

The future research includes the extension to scattered sampling patterns. One 
possible way is to define the interaction constant in the Hamiltonians (I3.1H3.2I) 
through a kernel function (such as the radial basis function), and the interaction 
neighborhood (nearest neighbors), for example, as pairs of points whose Voronoi cells 
have a common boundary. Another way is to first use a simple interpolation method 
to place the irregularly spaced points on a regular grid with a specified resolution and 
then proceed as in the current study. The latter approach would allow vectorization 
and preserve the computational efficiency Further possible extensions of the current 
models could also include further-neighbor or/and "higher-order" (e.g., three-point) 
correlation energy in the respective Hamiltonians. We could expect some elimination of 
the degeneracy, witnessed in the present models, and more faithful characterization of 
the nature of the spatial dependance. Both effects should contribute to the improvement 
of the classification performance. It would also be interesting to consider data sets with 
different patterns of missing data and investigate the effect of various gap patterns. 
Finally, it remains to be seen if the proposed methods can be used in the case of data 
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sets containing a small number of extreme values, for example, two or three unusually 
elevated values detected by a monitoring network in the case of a radioactivity release. 
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